Week 06: Sentiment Analysis with AI
Using API to AI to create data from text
Week 06: Sentiment Analysis with AI
Using API to AI to create data from text
Overview
Continue using text for research with AI
Learning Outcomes
By the end of the session, students will:
- Gain hands-on experience with sentiment analysis.
- Have experience integrating NLP in research
- Think about what is ground truth
Assignment review
Review of transformation options
- lexicon based counting numbers –> generate a transparent script
- Machine-learning classifiers, n-grams etc.
- LLM (transformer-based) one-shot: treating the LLM like a giant classifier: you hand it raw text and ask classes of sentiment. (Deep contextual understanding—word embeddings, attention across the whole sentence etc decide the sentiment.) No separate sentiment lexicon; it’s all encoded in the model weights.
- You can also take a pretrained LLM and continue training it on thousands of labeled review. See our example guidelines HERE
Let us discuss the concept ground truth (again): AI vs humans vs “truth”
Using APIs
Basics and setup
Examples
Simple walkthrough with GDP data – uses World Bank and FRED APIs
Bit harder walkthrough with football data – uses FBREF soccer data. Guess the club for example.
More advanced stuff
More advanced knowledge on APIs – how APIs work
Materials
Datasets
- texts (text_id level)
- games info (such as results, text_id level)
- class-ratings (human, AI ratio, text_id*student level)
- domain-rating (text_id level)
- class-rating-aggregated (text_id level)
code
Interview Analysis
Code for sentiment analysis of football manager interviews, here: interview scripts
- Domain lexicon creation in R
- script in R requires API key setup in R
- Script in Pythin – make sure you set your own API key
Preparation
- Download the combined data from Moodle
- Note: win, draw – need encode loss
Class tasks
Discussion 1
- Your experience regarding human vs ai ratings.
- What was difficult and easy as human rater
Data Analysis
- Take the aggregated file and ask AI for a readme. Discuss what is in the data
- Compare human, domain lexicon and AI rating. For human and AI take the average.
- Think of an interesting comparison using AI rating
- Compare results by human and lexicon rating
Discussion 2
- What is ground truth
How to integrate AI into research
- combine data with text
- think RQ and how you’d use AI
Additional tasks if time permits
predict gender and result
- Show AI all texts and ask to predict the gender of speaker
- Show AI all texts and ask to predict the result (manager’s team won, drew, lost)